Skip to content

refactor validators to remove duplicate files #5214

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Jun 10, 2025
Merged

Conversation

gvwilson
Copy link
Contributor

@gvwilson gvwilson commented Jun 4, 2025

This PR replaces #5173 by rebasing @bmaranville's changes on top of main. Please see #5173 for detailed notes about the motivation and changes. The important changes are in:

  • codegen/init.py

  • codegen/datatypes.py

  • codegen/validators.py

  • doc/python/marker-style.md

  • plotly/_subplots.py

  • plotly/basedatatypes.py

  • plotly/figure_factory/_annotated_heatmap.py

  • plotly/io/_templates.py

  • plotly/validator_cache.py

  • I have read through the contributing notes and understand the structure of the package. In particular, if my PR modifies code of plotly.graph_objects, my modifications concern the codegen files and not generated files.

  • I have added tests (if submitting a new feature or correcting a bug) or
    modified existing tests.

  • For a new feature, I have added documentation examples in an existing or
    new tutorial notebook (please see the doc checklist as well).

  • I have added a CHANGELOG entry if fixing/changing/adding anything substantial.

  • For a new feature or a change in behaviour, I have updated the relevant docstrings in the code to describe the feature or behaviour (please see the doc checklist as well).

@gvwilson gvwilson requested a review from emilykl June 4, 2025 13:41
@gvwilson gvwilson self-assigned this Jun 4, 2025
@gvwilson gvwilson mentioned this pull request Jun 4, 2025
5 tasks
| File      | Before (bytes) | After (bytes) |
| --------- | -------------: | ------------: |
| `.whl`    |       16265568 |       9447645 |
| `.tar.gz` |        7666222 |       6615305 |
@gvwilson gvwilson force-pushed the refactor-validators branch from f0fddc2 to 0184bd4 Compare June 4, 2025 13:59
@gvwilson gvwilson requested a review from alexcjohnson June 4, 2025 19:31
@gvwilson
Copy link
Contributor Author

gvwilson commented Jun 4, 2025

@alexcjohnson if you have time to look this over we'd be grateful for your feedback

@emilykl
Copy link
Contributor

emilykl commented Jun 5, 2025

@bmaranville The file plotly/validators/_validators.json needs to be flagged in the pyproject.toml so that it's included in the built package.

Looks like the correct way is to add validators/_validators.json to the list under [tool.setuptools.package-data].

@emilykl
Copy link
Contributor

emilykl commented Jun 5, 2025

Thanks for the PR @bmaranville ! Other than my comments above, looks good to me. The tests are passing and validation seems to be working as usual in the examples I've tried. (@alexcjohnson I'm curious whether you can see any possible pitfalls or edge cases here -- happy to do more testing of specific cases if you have concerns.)

@alexcjohnson
Copy link
Collaborator

@bmaranville nice work, this is a clever solution! Two small thoughts about it:

  • We should be able to reduce the whitespace significantly, to make the file smaller. Could even collapse it to a single line, I'm not sure there's much value keeping it human-readable, but maybe @emilykl has opinions about this. If we really want to optimize file size we could also change its structure a bit - like if every entry is a dict {params, superclass} this could be converted to a length-2 list. But the whitespace is the biggest piece of this.
  • json.load is pretty fast, ~41ms on my computer. But it's even faster if we just make this a Python file. In my quick test I just added:
true=True
false=False
null=None
v = 

to the beginning of the file to convert the JSON to Python, and then from validators._validators import v took only ~25ms. (If we do this for real we should be able to tweak the json.dump to output Python in the first place so we don't need to alias true, false, and null)

@emilykl the only thing I'm not 100% confident of is whether this has any type checking implications. I don't think so, I think the only type checking that matters is on graph_objs, not validators, but we do currently have if TYPE_CHECKING blocks in all the __init__.py files in validators, ensuring that type checkers will load them all. If you haven't done so already it's probably worth playing around on this branch a bit to convince ourselves that we don't lose anything important with this.

@gvwilson
Copy link
Contributor Author

@alexcjohnson 👍 on reducing whitespace, but I'd like to merge this one first, then combine it with #5218, and then tidy up the whitespace in the result - #5218 switches code formatting and checking from black to ruff, so if we're going to figure out how to configure a tool to stay silent about things like long lines, I'd like to do that once. I've created #5225 and will assign it to myself.

@gvwilson gvwilson force-pushed the refactor-validators branch from 48daf32 to a39e0e6 Compare June 10, 2025 14:08
-   fix black exclude parameter in pyproject.toml to properly exclude ONLY directories
-   reintroduce py38 in codegen/__init__.py
-   remove commented-out lines
-   add plotly/validators/_validators.json to pyproject.toml
-   regenerate code
@gvwilson gvwilson force-pushed the refactor-validators branch from a39e0e6 to 513337c Compare June 10, 2025 15:02
@gvwilson gvwilson merged commit 9dc08a1 into main Jun 10, 2025
10 checks passed
@emilykl
Copy link
Contributor

emilykl commented Jun 10, 2025

  • We should be able to reduce the whitespace significantly, to make the file smaller. Could even collapse it to a single line, I'm not sure there's much value keeping it human-readable, but maybe @emilykl has opinions about this.

@alexcjohnson I'm in favor of this. I don't think there's any particular reason to keep it human-readable, although it's also not a HUGE difference with compression. Looks like about 194 KB compressed with whitespace vs. 163 KB compressed without whitespace, from a quick test I did locally.

  • json.load is pretty fast, ~41ms on my computer. But it's even faster if we just make this a Python file. ... convert the JSON to Python, and then from validators._validators import v took only ~25ms. (If we do this for real we should be able to tweak the json.dump to output Python in the first place so we don't need to alias true, false, and null)

@alexcjohnson @gvwilson I'm in favor of this -- that's a big performance improvement. Something for a future PR? (as a bonus, then we wouldn't have to worry about including this file in the pyproject.toml, since Python files are included automatically.)

@emilykl the only thing I'm not 100% confident of is whether this has any type checking implications. I don't think so, I think the only type checking that matters is on graph_objs, not validators, but we do currently have if TYPE_CHECKING blocks in all the __init__.py files in validators, ensuring that type checkers will load them all. If you haven't done so already it's probably worth playing around on this branch a bit to convince ourselves that we don't lose anything important with this.

I'm not a heavy user of type checking so it's possible I'll miss something here. Syntax highlighting is working as expected with these changes. I turned on stricter Pylance type checking and haven't noticed any issues yet but I'll keep an eye out. Hopefully we can put these changes out in an RC to catch any issues before full release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants